
Identifying Mislabeled Training Data



Abstract

This paper presents a new approach to identifying and eliminating mislabeled training instances for supervised learning. The goal of this approach is to improve classification accuracies produced by learning algorithms by improving the quality of the training data. Our approach uses a set of learning algorithms to create classifiers that serve as noise filters for the training data. We evaluate single algorithm, majority vote and consensus filters on five datasets that are prone to labeling errors. Our experiments illustrate that filtering significantly improves classification accuracy for noise levels up to 30 percent. An analytical and empirical evaluation of the precision of our approach shows that consensus filters are conservative at throwing away good data at the expense of retaining bad data and that majority filters are better at detecting bad data at the expense of throwing away good data. This suggests that for situations in which there is a paucity of data, consensus filters are preferable, whereas majority vote filters are preferable for situations with an abundance of data.
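The filtering scheme the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes k-nearest-neighbour classifiers with different `k` as a stand-in for the paper's set of learning algorithms, 1-D numeric instances, and a simple cross-validation loop in which each held-out instance is flagged as noise when a majority (or the consensus) of the filter classifiers disagrees with its recorded label.

```python
from collections import Counter

def nn_predict(train, x, k):
    """Predict the label of point x by a k-nearest-neighbour vote
    over 1-D training points; train is a list of (x, label) pairs."""
    neigh = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(lbl for _, lbl in neigh).most_common(1)[0][0]

def filter_noise(data, ks=(1, 3, 5), scheme="majority", folds=5):
    """Cross-validated ensemble filter (illustrative sketch).

    Each instance is held out once; several classifiers (here k-NN
    with different k, standing in for distinct learning algorithms)
    are trained on the remaining folds and predict its label.
    'majority': discard the instance if most classifiers disagree
    with its recorded label. 'consensus': discard only if all do.
    Returns the retained (x, label) pairs.
    """
    kept = []
    for i in range(folds):
        test = data[i::folds]
        train = [p for j, p in enumerate(data) if j % folds != i]
        for x, lbl in test:
            errs = sum(nn_predict(train, x, k) != lbl for k in ks)
            if scheme == "majority":
                noisy = errs > len(ks) / 2
            else:  # consensus
                noisy = errs == len(ks)
            if not noisy:
                kept.append((x, lbl))
    return kept
```

With two well-separated clusters and one deliberately mislabeled point, both filters discard the bad instance and keep the rest; on noisier, overlapping data the consensus filter would keep more instances (retaining some bad data), matching the trade-off the abstract describes.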
